Market structure explained by pairwise interactions

Financial markets are a typical example of complex systems where interactions between constituents lead to many remarkable features. Here we give empirical evidence, making as few assumptions as possible, that the market microstructure capturing almost all of the available information in stock market data does not involve interactions of higher order than pairwise. We give an economic interpretation of this pairwise model. We show that it accurately recovers the empirical correlation coefficients; the collective behaviors are thus quantitatively described by models that capture the observed pairwise correlations but no higher-order interactions. Furthermore, we show that an order-disorder transition occurs as predicted by the pairwise model. Last, we make the link with the graph-theoretic description of stock markets, recovering the non-random and scale-free topology, the shrinking length during crashes and meaningful clustering features, as expected.

PACS numbers: 89.65.Gh, 89.75.Fb, 64.60.Cn, 64.60.De 



I. INTRODUCTION 

Complex systems are particularly interesting because they exhibit very sophisticated behaviors caused by, a priori, simple rules. Indeed, magnetic materials and neural networks, for instance, have some striking features such as phase transitions, memory, complicated equilibrium structures and clustering. It is remarkable that these properties are caused by interactions as simple as pairwise ones. We believe that the markets are also driven by such simple rules and that the highest-order interactions encountered in financial systems are the pairwise ones. Typical characteristics of a complex system are numerous entities and interaction rules (with a degree of non-linearity), all leading to the emergence of collective behaviors. Those behaviors in general depend more on the interactions (e.g. their scaling and their order) and their effects than on the intrinsic nature of the elementary constitutive entities taken individually. The market can be viewed as such a system. The entities can be stocks or traders interacting through non-obvious rules. We note that interaction should be interpreted in the broad sense of mutual or reciprocal influence.

What one knows is that the markets exhibit features such as synchronization, structural reorganization, power laws [4, 5], hierarchical structure and non-randomness. What one does not know is the true market dynamics. Even if trading rules are known, microscopic equations of motion are not. This is a fundamental difference between finance and physics (or neuroscience).

A natural approach, given the above considerations, is statistical modeling that collects and makes the best use of the available information and allows (in a certain sense) the emergence of critical properties. This is exactly the purpose of maximum entropy modeling in complex systems theory. Indeed, the maximum entropy principle (MEP) allows the selection of the least restrictive model on the basis of incomplete information. We choose this data-based approach to avoid relying on any particular microscopic scheme (e.g. trader-agent-based rules, a priori unknown), which is difficult to assess experimentally, and to avoid any analogy (even if some such models are valuable). The reason is that, even if one does not know the underlying microscopic processes, the macroscopic collective behaviors can still be described by an effective model. There is long experience of this powerful approach in the description of phase transitions and magnetic materials. More recently, it has led to valuable results on the description of real neural networks [10]. Moreover, this approach also has counterparts in economics. Indeed, in addition to the statistical meaning of the entropy, one can interpret it as a measure of economic activity [11], and it is linked to the central concept of utility of many interacting economic entities.

An important outcome of such modeling is a convenient simplified version of the real interaction structure that is still consistent with the data. In the following, we derive the model from this point of view and we study the structural properties of the resulting complex network. The critical properties will be investigated in another work.

The paper is organized as follows. In Section II we present the model, its economic interpretation and the link between the interaction matrix and the moments. In Section III we give evidence that the information embedded in the data is mostly explained by the pairwise but no higher-order interactions. In Section IV we show an order-disorder transition through actual data. In Section V we highlight the properties of the interaction matrix and its link to the crises. Finally, in Section VI we explain the link with the graph-theoretic approach and the topological evolution of the market network.

II. THE MODEL



A. Model derivation



The aim is to set up a statistical model describing the market state. This requires a way to infer the probability distribution in order to get the observables (here, the associated moments). The model will also allow the study of the market structure. All these quantities are defined below. We consider a set of $N$ market indices or $N$ stocks with binary states $s_i$ ($s_i = \pm 1$ for all $i = 1, \dots, N$). A system configuration is described by a vector $s = (s_1, \dots, s_N)$. The binary variable is equal to $1$ if the associated index is bullish and to $-1$ if not. A configuration $(s_1, \dots, s_N)$ is a binary version of the index returns. This approximation is already known to be useful in the description of neural populations [10], and neural networks are similar to financial networks. We may expect it to be useful in finance as well; this is justified a posteriori, as the model gives consistent results.

We seek to establish the least structured model explaining only the measured mean orientations $q_i = \langle s_i \rangle$ and instantaneous pairwise correlations $q_{ki} = \langle s_k s_i \rangle$. The brackets $\langle \cdot \rangle$ denote the average with respect to the unknown distribution $p(s)$. As the entropy of a distribution measures the randomness or the lack of interaction among the binary variables, a way to infer such a probability distribution knowing the mean orientations and the correlations is the maximum entropy principle. Jaynes showed how to derive the probability distribution using the maximum entropy principle [14]. It consists in the following constrained maximization:



$$\max S(s) = -\sum_{\{s\}} p(s) \log p(s) \tag{1}$$

$$\text{s.t.} \quad \sum_{\{s\}} p(s) = 1, \qquad \sum_{\{s\}} p(s)\, s_i = q_i, \qquad \sum_{\{s\}} p(s)\, s_i s_j = q_{ij}$$



Thus preferences are conjugate to mean orientations and interaction strengths to pairwise correlations. Including higher-order correlations in the constraints in (1) could bring more information and thus decrease the maximum entropy. We will show below that this will not be the case.

The Gibbs distribution (2) is similar to the one given by Brock and Durlauf in the discrete choice problem [12] and to the one in stochastic models in macroeconomics [11], and also to the Ising model used in the description of magnetic materials and neural networks [10]. It is also a special case of Markov random fields. Note that the Gibbs distribution and the Shannon entropy arise naturally from stochastic modeling in economics; this is discussed in [11].

We obtain the parameters $\{J_{ij}, h_i\}$ by explicitly performing the maximization (1) so that the theoretical moments $\langle s_i \rangle$ and $\langle s_i s_j \rangle$ match the measured ones $q_i$ and $q_{ij}$. We note that this requires the computation of $2^N$ terms. If this number is large, the computation becomes slow, and one can instead use one of the approximate inference methods discussed in Section III.
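For a small number of entities, the moment-matching step described above can be sketched by brute-force enumeration of the $2^N$ configurations. The Python sketch below is an illustration only, not the procedure used for the results reported here: the function names, the plain gradient-ascent update and its step size are assumptions, and the pair sum is written over $i < j$, which equals $\frac{1}{2}\sum_{i,j}$ for a symmetric $J$ with zero diagonal.

```python
import itertools
import math


def model_moments(J, h):
    """Exact <s_i> and <s_i s_j> under p(s) ~ exp(sum_{i<j} J_ij s_i s_j + sum_i h_i s_i)."""
    n = len(h)
    states = list(itertools.product((-1, 1), repeat=n))  # all 2^N configurations
    weights = []
    for s in states:
        pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        field = sum(h[i] * s[i] for i in range(n))
        weights.append(math.exp(pair + field))
    Z = sum(weights)  # partition function
    m = [sum(w * s[i] for w, s in zip(weights, states)) / Z for i in range(n)]
    c = [[sum(w * s[i] * s[j] for w, s in zip(weights, states)) / Z
          for j in range(n)] for i in range(n)]
    return m, c


def fit_pairwise(q1, q2, steps=5000, lr=0.1):
    """Match model moments to (q1, q2) by gradient ascent (assumed update rule)."""
    n = len(q1)
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        m, c = model_moments(J, h)
        for i in range(n):
            h[i] += lr * (q1[i] - m[i])  # gradient of the log-likelihood in h_i
            for j in range(i + 1, n):
                J[i][j] += lr * (q2[i][j] - c[i][j])  # gradient in J_ij
                J[j][i] = J[i][j]  # keep J symmetric
    return J, h
```

Each iteration costs of the order of $2^N N^2$ operations, which is why this direct approach is only practical for small sets such as the six indices.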

Last, we show how the cumulants are obtained from this model and their relation to the interaction strengths. As the statistical model (2) is expressed as a Gibbs distribution, we have the relations



$$\langle s_{i_1} \cdots s_{i_N} \rangle_c = \frac{\partial^N \ln Z}{\partial h_{i_1} \cdots \partial h_{i_N}} \tag{4}$$



where $\langle \cdot \rangle_c$ is the cumulant average [17]; it gives the relation between $J$ and the correlation functions. If the partition function $Z$ cannot be explicitly computed, we can use the Plefka series [18] or a variational cumulant expansion [19].
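The relation between cumulants and derivatives of $\ln Z$ can be checked numerically on a small system. The sketch below evaluates $\ln Z$ by enumeration and approximates its first and second derivatives with respect to the $h_i$ by central finite differences; the function names and step sizes are assumptions chosen for illustration.

```python
import itertools
import math


def log_partition(J, h):
    """ln Z for p(s) ~ exp(sum_{i<j} J_ij s_i s_j + sum_i h_i s_i), by enumeration."""
    n = len(h)
    total = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        total += math.exp(pair + sum(h[i] * s[i] for i in range(n)))
    return math.log(total)


def _shifted(h, i, delta):
    """Copy of h with h_i increased by delta."""
    out = list(h)
    out[i] += delta
    return out


def first_cumulant(J, h, i, eps=1e-5):
    """<s_i> as a central finite difference of ln Z with respect to h_i."""
    return (log_partition(J, _shifted(h, i, eps))
            - log_partition(J, _shifted(h, i, -eps))) / (2 * eps)


def second_cumulant(J, h, i, j, eps=1e-4):
    """<s_i s_j> - <s_i><s_j> as a mixed central finite difference of ln Z."""
    def f(di, dj):
        return log_partition(J, _shifted(_shifted(h, i, di), j, dj))
    return (f(eps, eps) - f(eps, -eps) - f(-eps, eps) + f(-eps, -eps)) / (4 * eps ** 2)
```

For a single entity without interactions, $Z = 2\cosh h$, so the first cumulant is $\tanh h$ and the second is $1 - \tanh^2 h$; the finite differences reproduce both.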

Hereafter, we will show that the covariances are consis- 
tently deduced from this statistical model and thus that 
they are a function of the interaction strengths. 



B. Interpretation 



The resulting two-agent distribution $p_2(s)$ is the following:

$$p_2(s) = Z^{-1} \exp\left( \frac{1}{2} \sum_{i,j}^{N} J_{ij} s_i s_j + \sum_{i=1}^{N} h_i s_i \right) \equiv \frac{e^{-H(s)}}{Z} \tag{2}$$

where $J_{ij}$ and $h_i$ are Lagrange multipliers and $Z$ is a normalizing constant (the partition function). The multipliers can be expressed in terms of partial derivatives of the entropy



$$J_{ij} = \frac{\partial S(s)}{\partial q_{ij}}, \qquad h_i = \frac{\partial S(s)}{\partial q_i} \tag{3}$$



We interpret the objective function $H(s)$ defined by the MEP in the distribution (2) as follows. The pairwise interactions between economic agents are modeled by interaction strengths $J_{ij}$, which describe how $i$ and $j$ influence each other. Here, by interaction we mean a measure of mutual influence or a measure of share comovement. In this framework, our intention is not to describe these interactions but to study their effects. Indeed, the causes underlying the interaction process seem to be unnecessary for the description of emergent macroscopic behaviors: the complicated interactions between magnetic moments or between neurons are efficiently simplified in their maximum entropy description, yet one still recovers the main macroscopic features observed in these systems. In this description, the crucial features are the scaling (dependence on or independence of the system size) of the interaction strengths and the order of the interactions. The matrix $J$ is taken to be symmetric in this first approach. There is disagreement or conflict between entities when the weighted product of their orientations $J_{ij} s_i s_j$ is negative. If two shares are supposed to move together ($J_{ij} > 0$), a conflicting situation is one where they do not have the same orientation (bearish or bullish).

We include the idiosyncratic preferences of the economic agents, here the willingness to be bullish or not. These Lagrange multipliers $h_i$ can also be interpreted as the external influences on entity $i$ induced by the macroeconomic background. For example, a company can prosper and make profits during a crisis period while the associated stock still falls, because investors are negatively influenced by the economic background. As a result, the stock has a propensity to fall. We denote the external influence by $h_i$. If the orientation of $i$ satisfies its preference, $h_i s_i$ is positive. The total conflict of the system is thus given by



We observe 2253 configurations of the binary states from 6/06/2002 to 14/06/2011 [21]. We take a nine-year-long time series including two large crises. Daily sampling is sufficient since we want to study large crises, and the two principal peaks of the Fourier transform are centered on the frequencies $f_1 = 6 \times 10^{-4}\,\mathrm{d}^{-1}$ and $f_2 = 1.2 \times 10^{-3}\,\mathrm{d}^{-1}$; the unit day stands for trading day. The first frequency $f_1$ is the crisis occurrence frequency in our time window; the corresponding period is $T_1 = 1.7 \times 10^{3}\,\mathrm{d}$. Later, we will also analyze the stocks composing the Dow Jones and the S&P100 indices, and another set of 116 stocks. First of all, we give the order of magnitude of the interaction strengths and of the empirical pairwise correlations in Fig. 1.
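The construction of the binary configurations and of the measured moments $q_i$ and $q_{ij}$ from daily prices can be sketched as follows. This is a minimal illustration under the stated rule (bullish if the closing price is higher than the opening price, bearish otherwise); the function names and the list-of-lists input layout are assumptions.

```python
def binarize(open_prices, close_prices):
    """+1 (bullish) if the close is above the open, -1 (bearish) otherwise."""
    return [1 if c > o else -1 for o, c in zip(open_prices, close_prices)]


def build_configurations(opens_by_index, closes_by_index):
    """One tuple (s_1, ..., s_N) per trading day from per-index price lists."""
    columns = [binarize(o, c) for o, c in zip(opens_by_index, closes_by_index)]
    return [tuple(day) for day in zip(*columns)]


def empirical_moments(configs):
    """Mean orientations q_i and pairwise correlations q_ij = <s_i s_j>."""
    T, n = len(configs), len(configs[0])
    q1 = [sum(s[i] for s in configs) / T for i in range(n)]
    q2 = [[sum(s[i] * s[j] for s in configs) / T for j in range(n)] for i in range(n)]
    return q1, q2
```

Note that a day on which the close equals the open is counted as bearish under this rule.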

The total conflict of the system reads

$$H(s) = -\frac{1}{2} \sum_{i,j}^{N} J_{ij} s_i s_j - \sum_{i=1}^{N} h_i s_i \tag{5}$$

So, we interpret $H(s)$ as the opposite of the so-called utility function $U(s) = -H(s)$ with pairwise interacting and idiosyncratic parts. Consequently, the interaction strengths can be viewed as the incentive complementarities [11]. Indeed, we have $\partial^2 U / \partial s_i \partial s_j = J_{ij}$. The larger $J_{ij} s_i s_j$ is, the stronger the strategic interaction between $i$ and $j$ is.

We emphasize that this Ising-like model is forced upon
us as the statistically consistent model with the measured 
orientations and correlations. It is not an analogy based 
on specific hypotheses about the market dynamics. 

III. CONSISTENCY OF THE PAIRWISE 
MODELING 

One of the most exciting features of the model is the emergence of collective behaviors even if the interactions are weak. If the model is able to explain the recorded data, the system is dominated by pairwise correlations. The aim is to provide quantitative empirical evidence that pairwise modeling is a consistent paradigm to explain the financial data and the behaviors exhibited by the market. In the following, we apply the pairwise model to a set of six major market indices (AEX, Bel-20, CAC 40, Xetra Dax, Eurostoxx 50, FTSE 100). We selected only European indices because some financial issues are specific to Europe, and we consider indices because they are the driving force of the respective stock markets [20]; they reflect the main properties of the underlying stock set. We say that they are up or bullish if the closing price is higher than the opening price, and down or bearish if not. These orientations constitute our binary states.



FIG. 1. (a) maximum entropy distribution of the interaction 

strengths J and (b) empirical distribution of the pairwise 
correlations obtained from the collected data. 

The $J_{ij}$ are all positive; we can therefore use the net mean orientation (net magnetization) as an order parameter. The mean value of $h_i$ is about 0.0113.

As mentioned above, higher-order interactions can be involved in the interaction structure. In order to show that pairwise correlations prevail, we compute the Kullback-Leibler (KL) divergence $D_{KL}(P_2 \| P_{\mathrm{data}})$ between the two-agent maximum entropy (ME) distribution $P_2$ and the empirical one $P_{\mathrm{data}}$. The KL divergence is equal to $2.27 \times 10^{-2}$ for the ME distribution inferred from 2253 observations. It must be compared to $D_{KL}(P_1 \| P_{\mathrm{data}}) = 1.4801$ for the independent-agent model $P_1$. The closer to zero this quantity is, the closer $P_2$ is to $P_{\mathrm{data}}$. Specifically, a consistent way to test whether the pairwise correlation model satisfactorily explains the data statistics is to evaluate the ratio between $I_2 = S(P_1) - S(P_2)$ and the Kullback-Leibler discrepancy $I_N = D_{KL}(P_N \| P_1)$, where $S(P_2)$ is the entropy of the pairwise model. If this ratio is close to 1, the pairwise correlations explain most of the available information. Indeed, the multi-information $I_N = S(P_1) - S(P_N)$ measures the total amount of correlations in the system [22]. In this application, we obtain $I_2 / I_N = 98.5\%$. The pairwise correlation model is effective since it explains almost all the available information; only 1.5% of the information is due to higher-order interactions.
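The quantities entering this test can be computed directly for a small number of entities. The sketch below builds the empirical distribution $P_N$ and the independent model $P_1$ from a list of configurations and evaluates entropies and KL divergences; the function names are assumptions, and the pairwise distribution $P_2$ (not computed here) would be supplied by a separate fit of the model.

```python
import itertools
import math
from collections import Counter


def empirical_distribution(configs):
    """Relative frequency of every one of the 2^N configurations."""
    n = len(configs[0])
    counts = Counter(configs)
    return {s: counts[s] / len(configs) for s in itertools.product((-1, 1), repeat=n)}


def independent_model(configs):
    """Product distribution P_1 that matches only the mean orientations q_i."""
    n = len(configs[0])
    q = [sum(s[i] for s in configs) / len(configs) for i in range(n)]
    return {s: math.prod((1 + s[i] * q[i]) / 2 for i in range(n))
            for s in itertools.product((-1, 1), repeat=n)}


def entropy(p):
    """Shannon entropy in nats; zero-probability states contribute nothing."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def kl_divergence(p, q):
    """D_KL(p || q); terms with p(s) = 0 contribute nothing."""
    return sum(v * math.log(v / q[s]) for s, v in p.items() if v > 0)
```

Because $P_1$ has the same single-entity marginals as $P_N$, the difference `entropy(p_1) - entropy(p_n)` coincides with `kl_divergence(p_n, p_1)`, which is the identity $I_N = S(P_1) - S(P_N) = D_{KL}(P_N \| P_1)$ used in the text.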

As a further test of the consistency of the pairwise model, we show below that this statistical model is able to recover the observed empirical moments. We compare the average index orientations $q_i = T^{-1} \sum_{t=1}^{T} s_{i,t}$ obtained by simulation to the real ones. We simulate the process with $1 \times 10^{5}$ equilibration Monte Carlo time steps (MCS) and take the average over the next $2 \times 10^{7}$ MCS in order to reduce the variance of the estimator. The flipping attempts are simulated by Glauber dynamics. Namely, we choose an entity $i$ at random, and the attempt to flip the associated binary variable is performed with a rate depending on the exponential weight, the other orientations remaining fixed [23]. We take the time average for each index from the data and compare it to the value obtained with the simulation; they are illustrated in Fig. 2.
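The single-flip dynamics described above can be sketched as follows. The text only states that the flip rate depends on the exponential weight; the specific acceptance probability $1 / (1 + e^{\Delta H})$ used here is the standard Glauber (heat-bath) choice and is an assumption, as are the function name, the thinning parameter and the seed handling.

```python
import math
import random


def glauber_sample(J, h, n_equil, n_samples, thin=1, seed=0):
    """Sample configurations of p(s) ~ exp(-H(s)) with single-flip Glauber dynamics."""
    rng = random.Random(seed)
    n = len(h)
    s = [rng.choice((-1, 1)) for _ in range(n)]

    def step():
        i = rng.randrange(n)  # pick one entity at random
        local = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
        delta_H = 2.0 * s[i] * local  # change of H(s) if s_i is flipped
        if rng.random() < 1.0 / (1.0 + math.exp(delta_H)):
            s[i] = -s[i]

    for _ in range(n_equil):  # equilibration steps, discarded
        step()
    samples = []
    for _ in range(n_samples):
        for _ in range(thin):  # steps between two recorded samples
            step()
        samples.append(tuple(s))
    return samples
```

Averaging `s[i]` over the returned samples gives the simulated mean orientation that is compared with the empirical $q_i$; for non-interacting entities it should approach $\tanh h_i$.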




FIG. 2. (Color online) Comparison of simulated orientations 
and the actual ones. The straight line shows equality. 

The root mean squared error (RMSE) is equal to $7 \times 10^{-4}$, which represents 1.5% of the root mean squared (RMS) value of the six arithmetic means (equal to $4.90 \times 10^{-2}$). We quantitatively recover the average orientation of the six indices over the observation period. Moreover, since we obtained the probability distribution, we can compare the correlation coefficients resulting from the sampling of the proposed probability distribution to the empirical ones. We sample the probability law $p_2(s, J^{ME}, h^{ME})$ by Markov chain Monte Carlo (MCMC). We take $1.2 \times 10^{6}$ equilibration steps and $1.2 \times 10^{4}$ independent sampling steps between each sample. Fig. 3 illustrates the correlation coefficients recovered with the maximum entropy estimation versus the empirical ones. The results for only 130 observations (chosen arbitrarily, corresponding to half a year) are conclusive. Indeed, the RMSE represents 8.3% of the RMS value, and the correlation coefficient of the empirical and simulated values is equal to 0.963. Including more observations (2258 trading days) reduces the dispersion in the results (correlation coefficient of the empirical and simulated values equal to 0.997; the RMSE represents 1.8% of the RMS value). We note that the approach is effective even with few data.

FIG. 3. (Color online) Recovered correlation coefficients from MCMC versus empirical ones. The straight line shows equality. The result based on 130 observations (left) and the result based on 2258 observations (right).

We perform the same work for the Dow Jones and the S&P100 indices (2500 configurations observed from 10/10/2001 to 02/08/2011). We also consider 116 stocks from the New York Stock Exchange, available on Onnela's website (http://jponnela.com/), extending from the beginning of 1982 to the end of 2000 (4800 trading days). For these larger stock sets, the exact entropy maximization (1) is not computationally tractable. There are several approximate inversion methods to estimate the parameters. The mean-field methods (naive, TAP and Tanaka's inversion; see [16], [25]) are the fastest ones, and they are accurate if the interaction strengths are weak (the weakness will be investigated in a further work). These methods will be used in the investigation of the structure evolution due to their reasonable accuracy and speed. Two other valuable inference methods are minimum probability flow (MPF) [26] and regularized pseudo-likelihood maximization (rPLM) [27]. In our application the rPLM method performs best. The results for the first and second recovered moments ($2 \times 10^{6}$ equilibration MCS, values estimated on $2 \times 10^{7}$ samples recorded every $N$ MCS) are illustrated in Fig. 4 and Fig. 5.
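Of the approximate methods listed above, the naive mean-field inversion is the simplest to write down. The sketch below uses its standard textbook form, $J_{ij} = -(C^{-1})_{ij}$ for $i \neq j$ with $C_{ij} = q_{ij} - q_i q_j$, and $h_i = \operatorname{artanh}(q_i) - \sum_j J_{ij} q_j$; it is given as an assumption-laden illustration and is neither the TAP or Tanaka variant nor the rPLM method found to perform best here. The helper names are hypothetical.

```python
import math


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (pure Python, small matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def naive_mean_field(q1, q2):
    """Naive mean-field estimate of (J, h) from the first and second moments."""
    n = len(q1)
    # connected correlations C_ij = <s_i s_j> - <s_i><s_j>
    C = [[q2[i][j] - q1[i] * q1[j] for j in range(n)] for i in range(n)]
    C_inv = invert(C)
    J = [[0.0 if i == j else -C_inv[i][j] for j in range(n)] for i in range(n)]
    h = [math.atanh(q1[i]) - sum(J[i][j] * q1[j] for j in range(n)) for i in range(n)]
    return J, h
```

The estimate requires $|q_i| < 1$ and an invertible $C$, and it is expected to be reliable only for weak interactions, in line with the remark above.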




FIG. 4. (Color online) Comparison of simulated orientations
and the actual ones. From left to right: DJ, S&P100 and
Onnela's set. The straight line shows equality.



FIG. 5. (Color online) Recovered covariances versus empirical 
ones. From left to right: DJ, S&P100 and Onnela's set. The 
straight line shows equality. 

The correlation coefficient between the recovered and empirical values is respectively 0.998, 0.996 and 0.997 for the net orientations illustrated in Fig. 4 and 0.989, 0.964 and 0.997 for the covariances illustrated in Fig. 5, which shows the strong linear statistical relation between the empirical and the recovered values. The ratio of the RMSE to the RMS value is respectively 2%, 7% and 6% for the net orientations and 9%, 17% and 8% for the covariances.
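The two agreement measures quoted above, the RMSE expressed as a fraction of the RMS of the empirical values and the linear correlation coefficient, can be computed as in the following sketch (function names are assumptions).

```python
import math


def relative_rmse(empirical, recovered):
    """RMSE between the two series divided by the RMS of the empirical values."""
    n = len(empirical)
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(empirical, recovered)) / n)
    rms = math.sqrt(sum(e * e for e in empirical) / n)
    return rmse / rms


def pearson(x, y):
    """Linear correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```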

We have seen that, in addition to the multi-information criterion, the net orientations and the covariances are recovered from this model even with few data. We conclude that the proposed pairwise interaction structure is trustworthy; this means that the interactions are believed to be pairwise and symmetric, and that they cause the correlations.



IV. ORDER-DISORDER TRANSITION 

As the previous pairwise model describes market indices quantitatively, we expect to observe an order-disorder transition in this system; we give below some empirical evidence that these transitions actually appear. As the interaction strengths are all positive, the system is ordered if the net orientation distribution has two modes near the extreme values $-1$ and $1$, and disordered if the distribution has a unique mode. Indeed, in an ordered situation, each index tends to have the same orientation as the others. Furthermore, in the absence of external influences, both extreme values are equivalent (as a consequence of the symmetry under sign exchange), and the distribution is thus bimodal. One of the extreme values can be favored depending on the values taken by the external influences $h_i$. A first clue that the system is reorganized is a change of the distribution from two modes to a unique one, or conversely. We compute the system net orientation $q(\tau, \Delta t) = (\Delta t\, N)^{-1} \sum_{i} \sum_{t=\tau}^{\tau + \Delta t - 1} s_{i,t}$ on successive periods $\Delta t$ of 25 trading days (without overlapping), and we show that the net orientation probability distribution can be bimodal or not on successive time windows. The resulting empirical distributions for observations from 5 November 2010 to 30 March 2011 are illustrated in Fig. 6.

FIG. 6. Empirical probability distribution of the net orientation on four successive periods, each of 25 trading days. Time goes from left to right. The last time window corresponds to the irregularity induced by the Fukushima nuclear accident.
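A windowed analysis of this kind can be sketched as follows. The code assumes that the distribution shown for each period is the histogram of the daily net orientation $N^{-1} \sum_i s_{i,t}$ within that period; this reading, like the function names, is an assumption rather than a statement of the exact procedure used.

```python
from collections import Counter


def daily_net_orientation(configs):
    """Net orientation of each trading day: the mean of s_i over the N indices."""
    return [sum(day) / len(day) for day in configs]


def window_net_orientation(configs, start, width):
    """q(tau, dt): average of the daily net orientation over one window."""
    daily = daily_net_orientation(configs[start:start + width])
    return sum(daily) / len(daily)


def window_histograms(configs, width=25):
    """Empirical distribution of the daily net orientation on successive
    non-overlapping windows (an incomplete trailing window is dropped)."""
    daily = daily_net_orientation(configs)
    histograms = []
    for start in range(0, len(daily) - width + 1, width):
        counts = Counter(daily[start:start + width])
        histograms.append({q: c / width for q, c in sorted(counts.items())})
    return histograms
```

A window whose histogram concentrates near $-1$ and $1$ would be read as ordered, and one concentrated around a single value as disordered.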

In Fig. 6 we see that the empirical probability distribution initially has two modes at extreme orientation values, then has no clear mode, and finally has two modes again.



During this period, initially the indices move in an orga- 
nized way then in a disorganized fashion, and finally the 
Fukushima nuclear accident caused a large global market 
fall followed by a large recovery. During this event, the 
indices were in comovement. So the system is initially 
ordered then disordered for two periods and then again 
ordered. 

Another way to characterize financial irregularities is to study the entropy $S(s)$ on a sliding window (here, 300 trading days shifted by 1 day). We compute the mean-field approximation of the entropy on those time windows (much faster than the exact computation). The mean-field entropy [28] is

$$S_{MF}(s) = -\sum_{i=1}^{N} \left[ \frac{1 + q_i}{2} \ln\left( \frac{1 + q_i}{2} \right) + \frac{1 - q_i}{2} \ln\left( \frac{1 - q_i}{2} \right) \right]. \tag{6}$$
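The sliding-window computation of the mean-field entropy can be sketched as below, assuming the standard independent-variable form in which each index contributes the binary entropy of its mean orientation $q_i$ on the window. Function names and the handling of the limit $x \ln x \to 0$ are assumptions.

```python
import math


def _xlogx(x):
    return x * math.log(x) if x > 0.0 else 0.0  # limit x ln x -> 0 as x -> 0


def mean_field_entropy(q):
    """Entropy of N independent binary variables with mean orientations q_i."""
    return -sum(_xlogx((1 + qi) / 2) + _xlogx((1 - qi) / 2) for qi in q)


def sliding_entropy(configs, width=300, shift=1):
    """Mean-field entropy on sliding windows of `width` days moved by `shift`."""
    n = len(configs[0])
    out = []
    for start in range(0, len(configs) - width + 1, shift):
        window = configs[start:start + width]
        q = [sum(s[i] for s in window) / width for i in range(n)]
        out.append(mean_field_entropy(q))
    return out
```

The value is $N \ln 2$ when all mean orientations vanish and zero when every index keeps a single orientation throughout the window.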

The entropy is maximal when the average orientations, computed on the corresponding time window, are equal to zero, and minimal when the indices have the same orientation. During a disordered period, the entropy should be large, and during a synchronized (ordered) period the entropy should be low. We should thus observe entropy minima simultaneously with orientation extrema (bubbles or crashes). We check in the results illustrated in Fig. 7 that orientation extrema and entropy minima coincide.

FIG. 7. The normalized sum of indices (full line), the normalized net orientation (dashed-dotted line) and the normalized mean-field entropy (dashed line). The curves have been smoothed. The last major crisis is pointed out by an arrow. The shaded portions show orientation extrema and entropy minima.

We observe large falls of the entropy when the net orientation is much larger than its mean (the mean is set to zero in Fig. 7). The shaded portions show the orientation extrema and entropy minima in this time window. They correspond (chronologically) to the end of the growth period and the end of the collapse. Furthermore, the correlation coefficient of the net orientation and the financial time series is equal to 0.82, showing a high degree of linear statistical dependence. We conclude that the entropy minima are related to financial irregularities (large upward or downward movements).

This is empirical evidence that order-disorder transitions occur in markets. This interpretation is supported by the recent results obtained in [1], where the authors showed that market irregularities present a high degree of synchronization, meaning an ordered state. The economic consequence is that the whole market is correlated when such transitions occur. It also means the absence of a characteristic scale for the fluctuations and the emergence of power laws.

In the appendix, we show a larger version of Fig. 7.



V. DYNAMICS OF INTERACTIONS 

Linked to the above, such a transition occurs if the stochasticity changes or the interaction strengths change. A possible interpretation of time-varying interaction strengths is that some learning or adaptive process takes place through time. This means that the market adjusts the interactions between its entities through adaptive processes, so the $\{J_{ij}, h_i\}$ are time dependent. The reason is that the background, namely worldwide economic conditions, changes through time and goes through economic fluctuations with contractions (recessions) and expansions (growth). As the correlations are explained by the pairwise interactions, it also means that future correlations do not necessarily match past correlations.

Following this interpretation, we expect that the temporal behaviors of the interaction strengths and external influences are related to market evolution. This is indeed true, as we will see below. First of all, we study the preference evolution of the six previous indices (reflecting the current state of the European economy) and its link to the crises. We use a sliding temporal window of width $T = 200$ trading days shifted by a constant amount of $\Delta t = 2$ trading days. We show that the aggregate preference $h = \sum_i h_i$ is negative during a crisis (or during a significant contraction), as illustrated in Fig. 8.

The first negative incursion corresponds to the 2002-2003 crisis and the second one to the 2008-2009 crisis. As expected, the external influences decrease when the market is subject to a crisis.

FIG. 8. The aggregate preference (dashed line) and the normalized sum of indices (full line); both curves have been smoothed. The last two major crises are pointed out by arrows.

More interestingly, we study the spectrum of the interaction matrix, whose evolution is related to the market evolution. The spectrum of the interaction matrix of a stock set has an interesting feature; we show it for the Dow Jones index. We collected data for the Dow Jones index from 10 October 2001 to 1 August 2011, and we extract the interaction strengths using the third-order approximation described in [25]. The trace of the interaction matrix, the sum of its eigenvalues, has the following interesting property. It decreases during a crisis; specifically, the trace minus its temporal average becomes negative if there is a substantial fall of the index. This feature is illustrated in Fig. 9.

FIG. 9. The normalized Dow Jones index is plotted as a full line; the trace minus its temporal average is the dashed line. We used a sliding temporal window of width equal to 200 trading days shifted each time by 5 trading days.

The trace of the exact interaction matrix should be zero (without self-interactions) but, with Tanaka's diagonal trick, the diagonal entries are related to the second-order term and to a part of the third-order term of the Plefka series [18, 25]. The second-order term of the Plefka series is negative; the sign of the third-order term depends on the product of the interaction strengths. The temporal variation of the trace reflects the temporal variation of these second- and third-order terms. These terms are particularly important near a transition. This explains why the trace of the interaction matrix is smaller than its mean value during a crisis. Indeed, during a crisis all the stocks act in a similar way: they fall. They thus have a similar mean orientation (down), and the resulting system state is an ordered one. Before the crisis, during a common market growth or steady state, the price of some stocks rises (on average) and some others fall, leading to a dispersion of the mean orientations. This is indirect evidence of a transition from one regime to another and of coordination. It is consistent with the results obtained above and in [1, 29]. In the appendix, we show a larger version of Fig. 9.



VI. LINK TO THE GRAPH-THEORETIC 
APPROACH 

Hereafter, we make the link between the previous spectrum feature and the observation that the length of the minimum spanning tree (MST) based on the Sornette-Mantegna distance decreases during a crash [30, 31], meaning that stocks are highly correlated during these events (as they should be in an order-disorder transition). We will see that we recover this feature with the pairwise model, with a distance based on interaction strengths in place of correlation coefficients. Indeed, the interaction matrix can be thought of as the weight matrix of an undirected complete graph. Using a modified version of the method proposed in [32] and computing the minimum spanning tree length $L(t)$ (the sum of the edge weights of the MST), we also observe that this length decreases during a crash, as expected; the results for the Dow Jones index are illustrated in Fig. 10.

FIG. 10. The normalized Dow Jones index is plotted as a full line; the relative difference to the time average of the length, $l(t) = [L(t) - \langle L \rangle] / \langle L \rangle$, is the dashed line (where the brackets denote the temporal average). We use a sliding window of 100 trading days shifted by 10 trading days each time.
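The tree construction can be sketched with Prim's algorithm on a dense distance matrix. The distance actually derived from the interaction strengths is not specified in this section, so the mapping $d_{ij} = e^{-J_{ij}}$ used below is purely hypothetical: it is only chosen so that stronger interactions give shorter edges. All function names are likewise assumptions.

```python
import math


def distance_from_interaction(J):
    """Hypothetical distance d_ij = exp(-J_ij): strong coupling means a short edge."""
    n = len(J)
    return [[0.0 if i == j else math.exp(-J[i][j]) for j in range(n)] for i in range(n)]


def minimum_spanning_tree(d):
    """Prim's algorithm on a dense symmetric distance matrix; returns (u, v, weight)."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, d[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v], parent[v] = d[u][v], u
    return edges


def tree_length(edges):
    """L(t): sum of the edge weights of the tree."""
    return sum(w for _, _, w in edges)


def vertex_degrees(edges, n):
    """Number of tree edges attached to each vertex."""
    deg = [0] * n
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def relative_length(lengths):
    """l(t) = [L(t) - <L>] / <L> for a series of tree lengths."""
    mean = sum(lengths) / len(lengths)
    return [(x - mean) / mean for x in lengths]
```

Applying `tree_length` to the tree obtained on each sliding window gives a series $L(t)$, and `relative_length` turns it into the quantity $l(t)$ plotted in Fig. 10; `vertex_degrees` gives the counts needed for a degree distribution.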

Moreover, it also allows cluster identification. Indeed, it is known that the asset tree based on the Sornette-Mantegna distance allows some stocks to be regrouped in clusters following their economic sectors [31]. As the correlations are caused by the interactions, it is not surprising that the MST of the network defined by the interaction matrix also allows cluster identification. This approach has the advantage of not being limited to linear or monotonic statistical dependencies. The clustering feature is illustrated in Fig. 11.

FIG. 11. (Color online) The minimum spanning tree based on the interaction matrix $J$, estimated on 2500 trading days. The companies are denoted by their ticker symbols; they can be found on any financial website (Google Finance, for instance).

We note that General Electric (GE) is not the most connected node, but it is a central one in the sense that it appears in three different clusters; as such it is still considered as the root of the MST and defines the generational direction. This approach provides a different classification from the one given in [31] or by Forbes, for instance. Indeed, the Forbes classification is given by sector, then by industry. Disney and Walmart are classified in the same sector, services; this category is too vague to be a useful tag. Similarly, General Electric is tagged by Forbes as industrial goods and then as diversified machinery, but this company also provides financial services, aircraft engines, TV channel broadcasting, etc. It is then clear that this company should be classified with more than one tag, as the proposed method does. From this point of view, the internal structure of each company seems to be the crucial information to identify stock clusters.

We can also study the topological structure of the re- 
maining asset tree during a crash and a growth period. 
We will see that, as expected, the degree distribution fol- 
lows a power law. We consider the stocks of the S&P100 
index on two intervals, from 1/10/2007 to 01/02/2009 
(360 trading day crisis period) and from 1/02/2005 to 
1/07/2007 (600 trading day growth period). The frequencies 
of occurrence of the vertex degrees are illustrated in 
Fig. 12. 
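
For illustration, the degree frequencies of a tree and the exponent of a power law f(n) ~ n^(-α) could be obtained as sketched below. The fitting procedure is not specified in the text; an ordinary least-squares fit in log-log coordinates is assumed here, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_frequencies(edges):
    """Map each degree n to the number of vertices having that degree.

    `edges` is a list of (u, v) pairs, for instance the edges of the MST.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def fit_power_law(freq):
    """Estimate alpha in f(n) ~ n^(-alpha).

    ASSUMPTION: ordinary least squares on log f(n) = c - alpha * log n.
    `freq` maps a degree n to its frequency f(n); at least two distinct
    degrees are required.
    """
    xs = [math.log(n) for n in freq]
    ys = [math.log(f) for f in freq.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope in log-log coordinates is -alpha
```

Calling `fit_power_law(degree_frequencies(mst_edges))` on the edge list of each period (`mst_edges` being a hypothetical variable) would give one exponent per period, which can then be compared.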

This reveals that the degree distribution is a power 
law, f(n) ~ n^{-α}, and the value of the exponent is similar 
for both periods. For the growth period, we obtain 
α = 1.64 ± 0.17 and during a crash α = 1.58 ± 0.12. 
Each value lies within the confidence interval of the 
other, so they are very similar. The maximum degree is 
n = 8 in both periods. There are 58 vertices of degree 
n = 1 during the crash. This value is slightly larger 
(about 10%) than the one corresponding to the growth 
period, 52 vertices of degree n = 1. This explains the 
difference between the two exponents. The asset tree topology 
is thus slightly different during a crash. The main 
change is the variation of the interaction strengths (the 
graph weights) rather than the variation of the vertex degrees. 
In both regimes, the asset trees are thus scale-free 
networks. This implies that the edges are not drawn at 
random and the asset trees exhibit small-worldness, as 
observed with another method in Q. Furthermore, the 
low value of this exponent implies that hubs (high-degree 
vertices) represent a significant part of the total number 
of vertices. The market is thus sensitive to the failure of a 
hub (a highly connected company) whereas the failure of 
a leaf (terminal node) will only slightly affect the market. 
For example, the hypothetical failure of the American Express 
Company (AXP) would leave a fragmented market 
whereas the bankruptcy of Kraft Foods Inc. (KFT) would 
not change the topology of the asset tree significantly; see 
Fig. 11. This could help in selecting the companies one 
has to save from a possible bankruptcy in order to minimize 
the impact of such an event. This could also help 
to select which companies one has to monitor to prevent 
a hypothetical dramatic system failure. 

FIG. 12. The degree distributions during a growth period 
(left) and during a crash (right). The solid line is a power-law 
fit; the coefficients of determination are respectively 0.98 
and 0.93. 

VII. CONCLUSION 

We have seen that, without making assumptions on the 
market dynamics, the maximum entropy principle provides 
a rigorous pairwise model which is able to describe 
the data and the observed collective behaviors quantitatively. 
We showed that including higher-order interactions 
does not explain more than using the pairwise 
model, and thus that the collective phenomena emerge 
from simple pairwise interactions. To confirm this result, 
we showed that this statistical model is able to recover 
the empirical moments computed from the data, especially 
the mean orientations and the correlations. The 
success of the pairwise model implies that markets exhibit 
some properties observed in magnetic materials and 
in neural networks. Indeed, we showed that an order-disorder 
transition occurs in such a system, as described 
by a pairwise model equivalent to the Ising model. Furthermore, 
we showed that the interaction strengths are 
time dependent, meaning that an adaptive process occurs, 
and that they are the starting point of the graph-theoretic 
approach of the market. In this view the system 
is more than the sum of its parts, is ruled by its pairs of 
entities, exhibits collective behaviors and is quantitatively 
described by a pairwise model. It is surprising that 
such sophisticated collective behaviors, emergent structures 
and underlying complex trading rules are captured 
by a simple (a priori) scheme of interdependence involving 
only pairwise but no higher-order interactions. 

A); the trace minus its temporal average is the dashed line 
(curve B). We used a sliding temporal window of width equal 
to 200 trading days translated each time by 1 trading day. 